Yu Su

Confidence in Large Language Model Evaluation: A Bayesian Approach to Limited-Sample Challenges

Apr 30, 2025
Via arXiv

MICE for CATs: Model-Internal Confidence Estimation for Calibrating Agents with Tools

Apr 28, 2025
Via arXiv

Completing A Systematic Review in Hours instead of Months with Interactive AI Agents

Apr 21, 2025
Via arXiv

SkillWeaver: Web Agents can Self-Improve by Discovering and Honing Skills

Apr 09, 2025
Via arXiv

An Illusion of Progress? Assessing the Current State of Web Agents

Apr 02, 2025
Via arXiv

Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems

Mar 31, 2025
Via arXiv

Towards Understanding Graphical Perception in Large Multimodal Models

Mar 13, 2025
Via arXiv

Building Machine Learning Challenges for Anomaly Detection in Science

Mar 03, 2025
Via arXiv

From RAG to Memory: Non-Parametric Continual Learning for Large Language Models

Feb 20, 2025
Via arXiv

Explorer: Scaling Exploration-driven Web Trajectory Synthesis for Multimodal Web Agents

Feb 19, 2025
Via arXiv